Abstract. We provide an overview of the hybrid compositional distributional
model of meaning, developed in [HJ , which is based on the categorical
methods also applied to the analysis of information flow in quantum
protocols. The mathematical setting stipulates that the meaning of a 
sentence is a linear function of the tensor products of the meanings of its 
words. We provide concrete constructions for this definition and present 
techniques to build vector spaces for meaning vectors of words, as well as 
that of sentences. The applicability of these methods is demonstrated via 
a toy vector space as well as real data from the British National Corpus 
and two disambiguation experiments.

1 Introduction

Words are the building blocks of sentences, yet the meaning of a sentence goes 
well beyond the meanings of its words. Indeed, while we do have dictionaries for 
words, we don't seem to need them to infer meanings of sentences. But where 
human beings seem comfortable doing this, machines fail to deliver. Automated
search engines that perform well when queried with single words fail to shine
when asked to search for the meanings of phrases and sentences. Discovering the process
of meaning assignment in natural language is among the most challenging as 
well as foundational questions of linguistics and computer science. The findings 
thereof will increase our understanding of cognition and intelligence and will also 
assist in applications to automating language-related tasks such as document 
search. 

To date, the compositional type-logical [17,13] and the distributional vector
space models [21,8] have provided two complementary partial solutions to the
question. The logical approach is based on classic ideas from mathematical logic,
mainly Frege's principle that the meaning of a sentence can be derived from the
relations of the words in it. The distributional model is more recent; it can be
related to Wittgenstein's philosophy of 'meaning as use', whereby meanings of
words can be determined from their context. The logical models have been the
champions of the theory side, but in practice their distributional rivals have
provided the best predictions.

In a cross-disciplinary approach, [6] used techniques from logic, category
theory, and quantum information to develop a compositional distributional
semantics that brought the above two models together. They developed a hybrid
categorical model which paired contextual meaning with grammatical form and
defined the meaning of a string of words to be a function of the tensor product
of the meanings of its words. As a result, meanings of sentences became vectors
which lived in the same vector space, and it became possible to measure
their synonymity in the same way lexical synonymity was measured in the
distributional models. This sentence space was taken to be an abstract space and it
was only shown how to instantiate it for the truth-functional meaning. Later, [9]
introduced a concrete construction using structured vector spaces and exemplified
the application of logical methods, albeit only on a toy vector space. In this
paper we report on this and on a second construction which uses plain vector
spaces. We also review results on implementing and evaluating the setting on
real large-scale data from the British National Corpus and two disambiguation
experiments [10].



2 Sketching the problem and a hybrid solution 

To compute the meaning of a sentence consisting of n words, meanings of these 
words must interact with one another. In the logical models of meaning, this 
further interaction is represented in a function computed from the grammatical 
structure of the sentence, but meanings of words are empty entities. The gram- 
matical structure is usually depicted as a parse-tree, for instance the parse-tree 
of the transitive sentence 'dogs chase cats' is as follows: 



                chase(dogs, cats)
               /                 \
           dogs           λx.chase(x, cats)
                          /               \
                      cats          λyx.chase(x, y)

The function corresponding to this tree is based on a relational reading of the 
meaning of the verb 'chase', which makes the subject and the object interact with 
each other via the relation of chasing. This methodology is used to translate
sentences of natural language into logical formulae, and then to use computer-aided
automation tools to reason about them [2]. The major drawback is that the
result can only deal with truth or falsity as the meaning of a sentence and does
poorly on lexical semantics, hence it does not perform well on language tasks such
as search.

The vector space model, on the other hand, dismisses the further interaction
and is solely based on lexical semantics. These are obtained in an operational
way, best described by a frequently cited quotation due to Firth [8]: "You
shall know a word by the company it keeps." For instance, beer and sherry are
both drinks, alcoholic, and often make you drunk. These facts are reflected in
the text: the words 'beer' and 'sherry' occur close to 'drink', 'alcoholic' and 'drunk'.
Hence meanings of words can be encoded as vectors in a high-dimensional
space of context words. The raw weight in each base is related to the number
of times the word has appeared close (in an n-word window) to that base.
This setting offers geometric means to reason about meaning similarity, e.g. via
the cosine of the angle between the vectors. Computational models along these
lines have been built using large vector spaces (tens of thousands of basis
vectors) and large bodies of text (up to a billion words) [7]. These models have
responded well to language processing tasks such as word sense discrimination,
thesaurus construction, and document retrieval [11,21]. Their major drawback
is their non-compositional nature: they ignore the grammatical structure and
logical words, hence cannot compute (in the same efficient way that they do for
words) meanings of phrases and sentences.
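
As a minimal sketch of the cosine measure described above, the following compares toy context vectors. The three context words and all counts are hypothetical values chosen for illustration; they are not taken from any corpus used in this paper.

```python
import math

# Hypothetical co-occurrence counts over the context words
# 'drink', 'alcoholic', 'drunk' (illustrative values only).
vectors = {
    "beer":   [12.0, 8.0, 5.0],
    "sherry": [9.0, 7.0, 3.0],
    "table":  [1.0, 0.0, 0.0],
}

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for an all-zero vector
    return dot / (norm_u * norm_v)

sim_close = cosine(vectors["beer"], vectors["sherry"])
sim_far = cosine(vectors["beer"], vectors["table"])
```

With these toy counts, 'beer' comes out closer to 'sherry' than to 'table', which is the kind of lexical similarity judgement the distributional models are built to deliver.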

The key idea behind the approach of [5] is to import the compositional
element of the logical approaches into the vector space models by making the
grammar of the sentence act on, and hence relate, its word vectors. The trouble is
that it does not make much sense to 'make a parse tree act on vectors'. Some
higher order mathematics, in this case category theory, is needed to encode the
grammar of a sentence into a morphism compatible with vector spaces.¹ These
morphisms turn out to be the grammatical reductions of a type-logic called a
Lambek pregroup [13]. Pregroups and vector spaces both have a compact
categorical structure. The grammatical morphism of a pregroup can be transformed
into a linear map that acts on vectors. Meanings of sentences become vectors
whose angles reflect similarity. Hence, at least theoretically, one should be able
to build sentence vectors and compare their synonymity in exactly the same
way as measuring synonymity for words.

The pragmatic interpretation of this abstract idea is as follows. In the vector
space models, one has a meaning vector for each word: dogs, chase, cats. The
logical recipe tells us to apply the meaning of the verb to the meanings of subject
and object. But how can a vector apply to other vectors? If we strip the vectors
of the extra information provided in their basis and look at them as mere sets
of weights, then we can apply them to each other by taking their point-wise
sum or product. But these operations are commutative, whereas meaning is not.
Hence this will equalize the meaning of any combination of words, even
non-grammatical combinations such as 'dogs cats chase'. The proposed solution
above implies that one needs to have different levels of meaning for words with
different functionalities. This is similar to the logical models whereby verbs are
relations and nouns are atomic sets. So verb vectors should be built differently
from noun vectors, for instance as matrices that relate and act on the atomic
noun vectors. The general information, as to which words should be matrices
and which atomic vectors, is in fact encoded in the type-logical representation of
the grammar. That is why the grammatical structure of the sentence is a good
candidate for the process that relates its word vectors.

¹ A similar passage had to be made in other type-logics to turn the parse-trees into
lambda terms, compatible with sets and relations.
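
The commutativity problem can be made concrete with a small sketch. It assumes two-dimensional toy vectors and, as a simplification, a one-dimensional sentence space, so that a verb matrix applied to subject and object yields a single number. The function names and all values are hypothetical.

```python
def pointwise_product(*vectors):
    """Multiply vectors coordinate by coordinate (order-insensitive)."""
    result = [1.0] * len(vectors[0])
    for vec in vectors:
        result = [r * x for r, x in zip(result, vec)]
    return result

def apply_verb(subject, verb_matrix, obj):
    """Contract a verb matrix with the subject (rows) and object (columns)."""
    return sum(
        subject[i] * verb_matrix[i][j] * obj[j]
        for i in range(len(subject))
        for j in range(len(obj))
    )

# Hypothetical toy data.
dogs = [2.0, 1.0]
cats = [1.0, 3.0]
chase_vec = [1.0, 2.0]          # verb as a plain vector
chase_mat = [[0.0, 5.0],
             [1.0, 0.0]]        # verb as an asymmetric matrix

# Point-wise composition cannot tell the two word orders apart.
same = (pointwise_product(dogs, chase_vec, cats)
        == pointwise_product(cats, chase_vec, dogs))

# The matrix-based verb distinguishes subject from object.
dogs_chase_cats = apply_verb(dogs, chase_mat, cats)
cats_chase_dogs = apply_verb(cats, chase_mat, dogs)
```

Here `same` is true, while the two matrix-based values differ, because the verb matrix treats its row index (subject) and column index (object) differently.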

In a nutshell, pregroup types are either atomic or compound. Atomic types
can be simple (e.g. $n$ for noun phrases, $s$ for statements) or left/right
superscripted, referred to as adjoint types (e.g. $n^r$ and $n^l$). An example of a
compound type is that of a verb, $n^r s n^l$. The superscripted types express that
the verb is a relation with two arguments of type $n$, which have to occur to the
right and to the left of it, and that it outputs an argument of the type $s$. A
transitive sentence is typed as shown below.

    dogs         chase        cats
     n       n^r   s   n^l     n
     |________|    |    |______|
                   |

Here, the verb interacts with the subject and object via the underlying wire
cups, then produces a sentence via the outgoing line. These interactions happen
in real time. The type-logical analysis assigns the type $n$ to 'dogs' and 'cats', for a
noun phrase, and the type $n^r s n^l$ to 'chase', for a verb; the superscripted types $n^r$
and $n^l$ express the fact that the verb is a function with two arguments of type
$n$, which have to occur to the right and left of it. The reduction computation
is $n (n^r s n^l) n \leq 1 s 1 = s$: each type $n$ cancels out with its right adjoint $n^r$ from the
right, i.e. $n n^r \leq 1$, and its left adjoint from the left, i.e. $n^l n \leq 1$, and 1 is the
unit of concatenation, $1n = n1 = n$. The algebra advocates a linear method of
parsing: a sentence is analyzed as it is heard, i.e. word by word, rather than by
first buffering the entire string then re-adjusting it as necessary on a tree. It has
been argued that the brain works in this one-dimensional linear (rather than
two-dimensional tree) manner [13].
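
The word-by-word reduction can be sketched with a small stack-based checker. This is an illustrative simplification, not a full pregroup parser: a greedy left-to-right strategy is enough for the example sentence but is not guaranteed to find a reduction in every case. The encoding of types as pairs and the names `reduce_types` and `LEXICON` are assumptions made for the sketch.

```python
def reduce_types(types):
    """Greedy left-to-right pregroup reduction.

    Each simple type is a pair (base, z): z = 0 for a plain type,
    z = 1 for a right adjoint, z = -1 for a left adjoint.
    Adjacent pairs (a, z)(a, z + 1) contract to the unit, which
    covers both n n^r <= 1 and n^l n <= 1.
    """
    stack = []
    for base, z in types:
        if stack and stack[-1] == (base, z - 1):
            stack.pop()                 # contraction to the unit 1
        else:
            stack.append((base, z))
    return stack

N, S = "n", "s"
LEXICON = {
    "dogs": [(N, 0)],
    "cats": [(N, 0)],
    "chase": [(N, 1), (S, 0), (N, -1)],   # n^r s n^l
}

def sentence_type(words):
    """Concatenate the word types and reduce them as they are read."""
    types = [t for w in words for t in LEXICON[w]]
    return reduce_types(types)

grammatical = sentence_type(["dogs", "chase", "cats"])
scrambled = sentence_type(["dogs", "cats", "chase"])
```

The grammatical order reduces to the single type `s`, whereas 'dogs cats chase' leaves unreduced types behind.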

According to [6], and based on a general completeness theorem between compact
categories, wire diagrams, and vector spaces, the meaning of a sentence can be
canonically reduced to linear algebraic formulae. For example, the following is the
meaning vector of our transitive sentence:

$$\overrightarrow{\text{dogs chase cats}} = f\left(\overrightarrow{\text{dogs}} \otimes \overrightarrow{\text{chase}} \otimes \overrightarrow{\text{cats}}\right)$$

Here $f$ is the linear map that encodes the grammatical structure. The categorical
morphism corresponding to it is denoted by the tensor product of 3 components:
$\epsilon_V \otimes 1_S \otimes \epsilon_W$, where V and W are subject and object spaces, S is the sentence
space, the $\epsilon$'s are the cups, and $1_S$ is the straight line in the diagram. The
cups stand for taking inner products, which when done with the basis vectors
imitate substitution. The straight line stands for the identity map that does
nothing. By the rules of the category, the above equation reduces to the following
linear algebraic formula with lower dimensions, hence the dimensional explosion
problem for tensor products is avoided:

$$\sum_{itj} C^{\text{chase}}_{itj} \, \langle \overrightarrow{\text{dogs}} \mid \vec{v}_i \rangle \, \vec{s}_t \, \langle \vec{w}_j \mid \overrightarrow{\text{cats}} \rangle$$

In the above equation, $\vec{v}_i$ and $\vec{w}_j$ are basis vectors of V and W. The meaning of the
verb becomes a superposition, represented as a linear map. The inner product
$\langle \overrightarrow{\text{dogs}} \mid \vec{v}_i \rangle$ substitutes the weights of dogs into the first argument place of the
verb (similarly for object and second argument place) and results in producing a
vector for the meaning of the sentence. These vectors live in sentence spaces S,
for which $\vec{s}_t$ is a base vector. The degree of synonymity of sentences is obtained
by taking the cosine measure of their vectors. S is an abstract space; it needs
to be instantiated to provide concrete meanings and synonymity measures. For
instance, a truth-theoretic model is obtained by taking the sentence space S to
be the 2-dimensional space with basis vectors true $|1\rangle$ and false $|0\rangle$. This is done
by using the weighting factor $C^{\text{chase}}_{itj}$ to define a model-theoretic meaning for the
verb as follows:



The definition of our meaning map ensures that this value propagates to the
meaning of the whole sentence. So chase(dogs, cats) becomes true whenever 'dogs
chase cats' is true and false otherwise.
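
A truth-theoretic instance of the reduced formula can be sketched as follows. The sketch assumes that the verb weights pick out $|1\rangle$ exactly when the relation holds between the corresponding basis vectors and $|0\rangle$ otherwise; the bases of V and W, the relation `CHASES`, and all function names are hypothetical.

```python
FALSE, TRUE = 0, 1                 # positions of |0> and |1> in S

SUBJECTS = ["dogs", "mice"]        # hypothetical basis of V
OBJECTS = ["cats", "cheese"]       # hypothetical basis of W
CHASES = {("dogs", "cats")}        # hypothetical chasing relation

def one_hot(word, basis):
    """Basis vector of `word` in the space spanned by `basis`."""
    return [1.0 if b == word else 0.0 for b in basis]

def verb_tensor(relation):
    """C[i][t][j] = 1 where t records whether relation(v_i, w_j) holds."""
    tensor = []
    for subj in SUBJECTS:
        planes = [[0.0] * len(OBJECTS) for _ in (FALSE, TRUE)]
        for j, obj in enumerate(OBJECTS):
            t = TRUE if (subj, obj) in relation else FALSE
            planes[t][j] = 1.0
        tensor.append(planes)
    return tensor

def sentence_vector(subject, tensor, obj):
    """sum_itj C_itj <subject|v_i> s_t <w_j|obj>, as a vector in S."""
    out = [0.0, 0.0]
    for i, s_i in enumerate(subject):
        for t in (FALSE, TRUE):
            for j, o_j in enumerate(obj):
                out[t] += tensor[i][t][j] * s_i * o_j
    return out

chase = verb_tensor(CHASES)
dogs_chase_cats = sentence_vector(
    one_hot("dogs", SUBJECTS), chase, one_hot("cats", OBJECTS))
mice_chase_cats = sentence_vector(
    one_hot("mice", SUBJECTS), chase, one_hot("cats", OBJECTS))
```

In this toy model the first sentence lands on the true basis vector and the second on the false one, which is how the truth value of the relation propagates to the sentence.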

3 Two Concrete Constructions for Sentence Spaces 



The above construction is based on the assumptions that dogs is a base of V
and that cats is a base of W. In other words, we assume that V is the vector
space spanned by the set of all men and W is the vector space spanned by the
set of all women. This is not the usual construction in the distributional models.
In what follows we present two concrete constructions for these, which will then
yield a construction for the sentence space. In both of these approaches V and
W will be the same vector space, which we will denote by N.

3.1 Structured Vector Spaces and a Toy Corpus 

We take N to be a structured vector space, as in [11]. The bases of N are
annotated by 'properties' obtained by combining dependency relations with nouns,
verbs and adjectives. For example, basis vectors might be associated with
properties such as "arg-fluffy", denoting the argument of the adjective fluffy,
"subj-chase", denoting the subject of the verb chase, "obj-buy", denoting the object of
the verb buy, and so on. We construct the vector for a noun by counting how
many times in the corpus a word has been the argument of 'fluffy', the subject
of 'chase', the object of 'buy', and so on.
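
The counting step can be sketched as follows, assuming the corpus has already been parsed into (noun, property) observations. The observations, the basis order, and the helper name `noun_vector` are hypothetical.

```python
from collections import Counter

# Hypothetical dependency observations extracted from a parsed corpus:
# each pair says that the noun occurred with the given property.
OBSERVATIONS = [
    ("cat", "arg-fluffy"), ("cat", "arg-fluffy"), ("cat", "obj-buy"),
    ("dog", "subj-chase"), ("dog", "arg-fluffy"), ("cat", "subj-chase"),
]
BASIS = ["arg-fluffy", "subj-chase", "obj-buy"]

def noun_vector(noun, observations, basis):
    """Count how often `noun` was seen with each basis property."""
    counts = Counter(prop for word, prop in observations if word == noun)
    return [counts[prop] for prop in basis]

cat_vec = noun_vector("cat", OBSERVATIONS, BASIS)
dog_vec = noun_vector("dog", OBSERVATIONS, BASIS)
```

Each coordinate of the resulting vector is a raw count; any weighting or normalisation would be applied on top of these counts.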

For transitive sentences, we take the sentence space S to be $N \otimes N$, so its
bases are of the form $\vec{s}_t = (\vec{n}_i, \vec{n}_j)$. The intuition is that, for a transitive verb,
the meaning of a sentence is determined by the meaning of the verb together
with its subject and object. The verb vectors $C^{\text{verb}}_{itj}(\vec{n}_i, \vec{n}_j)$ are built by counting
how many times a word that is $\vec{n}_i$ (e.g. has the property of being fluffy) has
been subject of the verb and a word that is $\vec{n}_j$ (e.g. has the property that it's
bought) has been its object, where the counts are moderated by the extent
to which the subject and object exemplify each property (e.g. how fluffy the
subject is). To give a rough paraphrase of the intuition behind this approach,
the meaning of "dog chases cat" is given by: the extent to which a dog is fluffy and
a cat is something that is bought (for the $N \otimes N$ property pair "arg-fluffy" and
"obj-buy"), and the extent to which fluffy things chase things that are bought
(accounting for the meaning of the verb for this particular property pair); plus
the extent to which a dog is something that runs and a cat is something that is
cute (for the $N \otimes N$ pair "subj-run" and "arg-cute"), and the extent to which
things that run chase things that are cute (accounting for the meaning of the
verb for this particular property pair); and so on for all noun property pairs.

For sentences with intransitive verbs, the sentence space suffices to be just
N. To compare the meaning of a transitive sentence with an intransitive one,
we embed the meaning of the latter from N into the former $N \otimes N$, by taking
$\vec{\varepsilon}_n$ (the 'object' of an intransitive verb) to be $\sum_i \vec{n}_i$, i.e. the superposition of all
basis vectors of N. A similar method is used when dealing with sentences with
ditransitive verbs, where the sentence space will be $N \otimes N \otimes N$, since these verbs
have three arguments. Transitive and intransitive sentences are then embedded
in this bigger space, using the same embedding described above.
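
The embedding of an intransitive sentence into $N \otimes N$ can be sketched as a tensor with the superposition of all basis vectors. The sketch assumes the intransitive sentence vector is placed in the first factor and the all-ones vector in the second (the 'object' position); the sample vector is hypothetical.

```python
def embed_intransitive(sentence_vec):
    """Map v in N to v (x) eps in N (x) N, where eps is the sum of all
    basis vectors of N, i.e. the all-ones vector."""
    dim = len(sentence_vec)
    epsilon = [1.0] * dim
    return [[v_i * e_j for e_j in epsilon] for v_i in sentence_vec]

def flatten(matrix):
    """View an element of N (x) N as one long coordinate list."""
    return [x for row in matrix for x in row]

dogs_sleep = [3.0, 0.0, 1.0]                 # hypothetical vector in N
embedded = embed_intransitive(dogs_sleep)    # 3 x 3 array in N (x) N
```

After flattening, the embedded vector has the same number of coordinates as a transitive sentence vector, so the two can be compared with the usual cosine measure.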

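Adjectives are dealt with in a similar way. We give them the syntactic type
$n n^l$ and build their vectors in $N \otimes N$. The syntactic reduction $n n^l n \to n$
associated with applying an adjective to a noun gives us the map $1_N \otimes \epsilon_N$ by which
we semantically compose an adjective with a noun, as follows:

$$\overrightarrow{\text{adjective noun}} = (1_N \otimes \epsilon_N)(\overrightarrow{\text{adj}} \otimes \overrightarrow{\text{noun}}) = \sum_{ij} C^{\text{adj}}_{ij} \, \vec{n}_i \, \langle \vec{n}_j \mid \overrightarrow{\text{noun}} \rangle$$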

We can view the counts as determining what sorts of properties the arguments
of a particular adjective typically have (e.g. arg-red, arg-colourful for the
adjective "red").
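
In coordinates, composing an adjective with a noun is a matrix-vector product. The following sketch uses a hypothetical two-dimensional space with bases "arg-red" and "arg-colourful"; the weights are invented for illustration.

```python
def apply_adjective(adj_matrix, noun):
    """Return sum_ij C_ij n_i <n_j|noun>, i.e. the matrix-vector product."""
    return [
        sum(c_ij * x_j for c_ij, x_j in zip(row, noun))
        for row in adj_matrix
    ]

# Hypothetical weights over the bases ("arg-red", "arg-colourful").
red = [[4.0, 1.0],
       [0.0, 2.0]]
car = [3.0, 5.0]

red_car = apply_adjective(red, car)
```

The result is again a vector in N, so an 'adjective noun' phrase can be compared with plain nouns or fed to a verb in place of a noun.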

As an example, consider a hypothetical vector space with bases 'arg-fluffy', 
'arg-ferocious', 'obj-buys', 'arg-shrewd', 'arg-valuable', with vectors for 'bankers', 
'cats', 'dogs', 'stock', and 'kittens'. 







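[Table: noun vectors over the five bases. Columns: bankers, cats, dogs, stock,
kittens. One entry per row did not survive extraction, so the column alignment of
the remaining values is uncertain.]

    1  arg-fluffy       (blank)  7  3  2
    2  arg-ferocious    4  1  6  (blank)
    3  obj-buys         (blank)  4  2  7
    4  arg-shrewd       6  3  1  1
    5  arg-valuable     0  1  2  8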



Since in the method proposed above, $C^{\text{verb}}_{itj} = 0$ if $\vec{s}_t \neq (\vec{n}_i, \vec{n}_j)$, we can simplify
the weight matrices for transitive verbs to two dimensional $C^{\text{verb}}_{ij}$ matrices as
shown below, where $C^{\text{verb}}_{ij}$ corresponds to the number of times the verb has
a subject with attribute $\vec{n}_i$ and an object with attribute $\vec{n}_j$. For example, the
matrix below encodes the fact that something ferocious ($i = 2$) chases something
fluffy ($j = 1$) seven times in the hypothetical corpus from which we might have
obtained these distributions.

[Matrix $C^{\text{chase}}$: 5 × 5 array, only partly recoverable. Surviving rows as
extracted: 1 / 7 1 2 3 1 / 2 1 0 1 / 1, where the second row starts with the entry
$C^{\text{chase}}_{21} = 7$.]



Once we have built matrices for verbs, we are able to follow the categorical 
procedure and automatically build vectors for sentences, then perform sentence 
comparisons. The comparison is done in the same way as for lexical semantics, 
i.e. by taking the inner product of the vectors of two sentences and normalizing it 
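by the product of their lengths. For example, the following shows a high similarity:

$$\cos(\overrightarrow{\text{dogs chase cats}}, \overrightarrow{\text{dogs pursue kittens}}) = \frac{\langle \overrightarrow{\text{dogs chase cats}} \mid \overrightarrow{\text{dogs pursue kittens}} \rangle}{|\overrightarrow{\text{dogs chase cats}}| \times |\overrightarrow{\text{dogs pursue kittens}}|}$$

$$= \frac{\left\langle \sum_{itj} C^{\text{chase}}_{itj} \langle \overrightarrow{\text{dogs}} \mid \vec{n}_i \rangle \vec{s}_t \langle \vec{n}_j \mid \overrightarrow{\text{cats}} \rangle \;\middle|\; \sum_{itj} C^{\text{pursue}}_{itj} \langle \overrightarrow{\text{dogs}} \mid \vec{n}_i \rangle \vec{s}_t \langle \vec{n}_j \mid \overrightarrow{\text{kittens}} \rangle \right\rangle}{|\overrightarrow{\text{dogs chase cats}}| \times |\overrightarrow{\text{dogs pursue kittens}}|}$$

$$= \frac{\sum_{itj} C^{\text{chase}}_{itj} C^{\text{pursue}}_{itj} \langle \overrightarrow{\text{dogs}} \mid \vec{n}_i \rangle \langle \overrightarrow{\text{dogs}} \mid \vec{n}_i \rangle \langle \vec{n}_j \mid \overrightarrow{\text{cats}} \rangle \langle \vec{n}_j \mid \overrightarrow{\text{kittens}} \rangle}{|\overrightarrow{\text{dogs chase cats}}| \times |\overrightarrow{\text{dogs pursue kittens}}|} = 0.979$$

A similar computation will provide us with the following, demonstrating a low
similarity:

$$\cos(\overrightarrow{\text{dogs chase cats}}, \overrightarrow{\text{bankers sell stock}}) = 0.042$$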
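
The sentence comparison can be sketched for the simplified two-dimensional verb matrices. All vectors and matrices below are hypothetical three-dimensional stand-ins, not the values of the toy corpus above, so the resulting cosines are not expected to reproduce 0.979 or 0.042.

```python
import math

def sentence_matrix(subject, verb, obj):
    """Entry (i, j) is C_ij * <subject|n_i> * <n_j|obj>."""
    return [
        [verb[i][j] * subject[i] * obj[j] for j in range(len(obj))]
        for i in range(len(subject))
    ]

def cosine(a, b):
    """Cosine between two sentence matrices, viewed as flat vectors."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm = (math.sqrt(sum(x * x for x in flat_a))
            * math.sqrt(sum(y * y for y in flat_b)))
    return dot / norm if norm else 0.0

# Hypothetical vectors over ("arg-fluffy", "arg-ferocious", "arg-valuable").
dogs, cats, kittens = [3.0, 6.0, 2.0], [7.0, 1.0, 1.0], [2.0, 1.0, 1.0]
bankers, stock = [0.0, 4.0, 0.0], [0.0, 0.0, 8.0]

# Hypothetical verb matrices C_ij.
chase = [[1.0, 0.0, 0.0], [7.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
pursue = [[0.0, 0.0, 0.0], [4.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
sell = [[0.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.0, 1.0]]

high = cosine(sentence_matrix(dogs, chase, cats),
              sentence_matrix(dogs, pursue, kittens))
low = cosine(sentence_matrix(dogs, chase, cats),
             sentence_matrix(bankers, sell, stock))
```

With these stand-in values the chase/pursue pair scores much higher than the chase/sell pair, mirroring the qualitative pattern reported above.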

The construction for adjective matrices is similar: we stipulate the $C^{\text{adj}}_{ij}$
matrices by hand and eliminate all cases where $i \neq j$ since $C_{ij} = 0$, hence these
become one dimensional matrices. Here is an example:

$$C^{\text{fluffy}} = [9 \;\; 3 \;\; 4 \;\; 2 \;\; 2]$$

Vectors for 'adjective noun' clauses are computed similarly and are used to
compute the following similarity measures:

    cosine(fluffy dog, shrewd banker) = 0.389

    cosine(fluffy cat, valuable stock) = 0.184

These calculations carry over to sentences which contain the 'adjective noun'
clauses. For instance, we obtain an even lower similarity measure between the
following sentences:

    cosine(fluffy dogs chase fluffy cats, shrewd bankers sell valuable stock) = 0.016

Other constructs such as prepositional phrases and adverbs are treated similarly,
see [9].

3.2 Plain Vector Spaces and the BNC

The above concrete example is fine grained, but involves complex constructions
which are time and space costly when implemented. To be able to evaluate the
setting against real large-scale data, we simplified it by taking N to be a plain
vector space whose bases are words, without annotations. The weighting factor
$C^{\text{verb}}_{ij}$ is determined in the same way as above, but this time by just counting
co-occurrence rather than being arguments of syntactic roles. More precisely, this
weight is determined by the number of times the subjects of the verb have
co-occurred with the base $\vec{n}_i$. In the previous construction we went beyond
co-occurrence and required that the subject (similarly for the object) should be in
a certain relation with the verb; for instance, if $\vec{n}_i$ was 'arg-fluffy', the subject
had to be an argument of fluffy, whereas here we instead have $\vec{n}_i$ = 'fluffy', and
the subject has to co-occur with 'fluffy' rather than being directly modified by
it.

The procedure for computing these weights for the case of transitive sentences
is as follows: first browse the corpus to find all occurrences of the verb in question;
suppose it has occurred as a transitive verb in k sentences. For each sentence
determine the subject and the object of the verb. Build vectors for each of these
using the usual distributional method. Multiply their weights on all permutations
of their coordinates and then take the sum of each such multiplication across
each of the k sentences. Linear algebraically, this is just the sum of the Kronecker
products of the vectors of subjects and objects:

$$\overrightarrow{\text{verb}} = \sum_{k} \left( \overrightarrow{\text{sub}}_k \otimes \overrightarrow{\text{obj}}_k \right)$$

Recall that given a vector space A with basis $\{\vec{n}_i\}_i$, the Kronecker product of
two vectors $\vec{v} = \sum_i c^a_i \vec{n}_i$ and $\vec{w} = \sum_i c^b_i \vec{n}_i$ is defined as follows:

$$\vec{v} \otimes \vec{w} = \sum_{ij} c^a_i c^b_j \, (\vec{n}_i \otimes \vec{n}_j)$$
As an example, we worked with the British National Corpus (BNC), which
has about 6 million sentences. We built noun vectors and computed matrices
for intransitive verbs, transitive verbs, and adjectives. For instance, consider
N to be the space with four basis vectors 'far', 'room', 'scientific', and 'elect';
the (TF/IDF) values for vectors of the four nouns 'table', 'map', 'result', and
'location' are shown in Table 1.

A section of the matrix of the transitive verb 'show' is represented in Table 2.

As a sample computation, suppose the verb 'show' only appears in two sentences
in the corpus: 'the map showed the location' and 'the table showed the result'.
The weight $C_{11}$ for the base (far, far) is computed by multiplying the weights of
'table' and 'result' on far, i.e. 6.6 × 7, multiplying the weights of 'map' and 'location'
on far, i.e. 5.6 × 5.9, then adding these, 46.2 + 33.04, and obtaining the total weight
79.24.

    i   basis        table   map    result   location
    1   far          6.6     5.6    7        5.9
    2   room         27      7.4    0.99     7.3
    3   scientific   0       5.4    13       6.1
    4   elect        0       0      4.2      0

Table 1. Sample noun vectors from the BNC.





                 far      room     scientific   elect
    far          79.24    47.41    119.96       27.72
    room         232.66   80.75    396.14       113.2
    scientific   32.94    31.86    32.94        0
    elect        0        0        0            0

Table 2. Sample verb matrix from the BNC.
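
The sample computation for 'show' can be sketched as a sum of Kronecker products over the noun vectors of Table 1. The sketch assumes that empty cells in Table 1 stand for zero weights and that the verb occurs in exactly the two sentences mentioned in the text; the function names are hypothetical.

```python
BASIS = ["far", "room", "scientific", "elect"]

# Weights from Table 1; empty cells are assumed to be 0.
NOUNS = {
    "table":    [6.6, 27.0, 0.0, 0.0],
    "map":      [5.6, 7.4, 5.4, 0.0],
    "result":   [7.0, 0.99, 13.0, 4.2],
    "location": [5.9, 7.3, 6.1, 0.0],
}

def kronecker(u, v):
    """Kronecker product of two vectors, laid out as a matrix."""
    return [[a * b for b in v] for a in u]

def verb_matrix(pairs, nouns):
    """Sum the Kronecker products subject (x) object over all sentences."""
    dim = len(BASIS)
    total = [[0.0] * dim for _ in range(dim)]
    for subj, obj in pairs:
        outer = kronecker(nouns[subj], nouns[obj])
        for i in range(dim):
            for j in range(dim):
                total[i][j] += outer[i][j]
    return total

# 'the table showed the result' and 'the map showed the location'
show = verb_matrix([("table", "result"), ("map", "location")], NOUNS)
far_row = [round(x, 2) for x in show[0]]
```

Under these assumptions, `far_row` reproduces the first row of Table 2, starting with the weight 79.24 worked out in the text.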



The computations for building vectors for sentences and other phrases are 
the same as in the case for structured vector spaces. The matrix of a transitive 
verb has 2 dimensions since it takes as input two arguments. The same method 
is applied to build matrices for ditransitive verbs, which will have 3 dimensions, 
and intransitive verbs, as well as adjectives and adverbs, which will be of 1 
dimension each. 

4 Evaluation and Experiments 

We evaluated our second concrete method on a disambiguation task and
performed two experiments [10]. The general idea behind this disambiguation task
is that some verbs have different meanings and the context in which they appear 
is used to disambiguate them. For instance the verb 'show' can mean 'express' 
in the context 'the table showed the result' or it can mean 'picture', in the con- 
text 'the map showed the location'. Hence if we build meaning vectors for these 
sentences compositionally, the degrees of synonymity of the sentences can be 
used to disambiguate the meaning of the verb in that sentence. Suppose a verb 
has two meanings and it has occurred in two sentences. Then if in both of these 
sentences it has its meaning number 1, the two sentences will have a high degree 
of synonymity, whereas if in one sentence the verb has its meaning number 1 
and in the other its meaning number 2, the sentences will have a lower degree of 
synonymity. For instance, 'the table showed the result' and 'the table expressed
the result' have a high degree of synonymity, and similarly for 'the map showed
the location' and 'the map pictured the location'. This degree decreases for the 
two sentences 'the table showed the result' and 'the table pictured the result'. 
We used our second concrete construction to implement this task. 

The data set for our first experiment was developed by [16] and had 120
sentence pairs. These were all intransitive sentences. We compared the results of our
method with composition operations implemented by [16]; these included
addition, multiplication, and a combination of the two using weights. The best results
were obtained by the multiplication operator. Our method provided slightly
better results. However, the context provided by intransitive sentences is just
one word, hence the results do not showcase the compositional abilities of our
method. In particular, in such a small context, our method and the multiplication
method became very similar, hence the similarity of results did not surprise
us. There are nevertheless two major differences: our method respects the
grammatical structure of the sentences (whereas the multiplication operation does
not) and in our method the vector of the verb is computed differently from the
vectors of the nouns: as a relation and via a second order construction.

For the second experiment, we developed a data set of transitive sentences. 
We first picked 20 transitive verbs from the most frequent verbs of the BNC,
each with at least two different non-overlapping meanings. These were
retrieved using the JCN (Jiang Conrath) information content synonymity measure
of WordNet. The above example for 'show' and its two meanings 'express' and
'picture' is one such example. For each such verb, e.g. 'show', we retrieved 10 
sentences which contained them (as verbs) from the BNC. An example of such 
a sentence is 'the table showed the result'. We then substituted in each sentence 
each of the two meanings of the verb, for instance 'the table expressed the result' 
and 'the table pictured the result'. This provided us with 400 pairs of sentences 
and we used the plain method described above to build vectors for each sentence 
and compute the cosine of each pair. A sample of these pairs is provided below. 





         Sentence 1              Sentence 2
    1    table show result       table express result
    2    table show result       table picture result
    3    map show location       map picture location
    4    map show location       map express location
    5    child show interest     child picture interest
    6    child show interest     child express interest

Table 3. Sample sentence pairs from the second experiment dataset.



In order to judge the performance of our method, we followed guidelines
from [16]. We distributed our data set among 25 volunteers who were asked to
rank each pair based on how similar they thought its sentences were. The ranking was
between 1 and 7, where 1 was almost dissimilar and 7 almost identical. Each
pair was also given a HIGH or LOW classification by us. The correlation of the
model's similarity judgements with the human judgements was calculated using
Spearman's ρ, a metric which is deemed to be more scrupulous and ultimately
that by which models should be ranked. It is assumed that inter-annotator
agreement provides the theoretical maximum ρ for any model for this experiment;
taking the cosine measure of the verb vectors while ignoring the nouns was
taken as the baseline.
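
For readers unfamiliar with the metric, Spearman's ρ can be sketched as the Pearson correlation of the two rank sequences, with tied values sharing their average rank. This is a generic textbook formulation, not the evaluation script used in the experiments, and the sample scores are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank sequences."""
    return pearson(average_ranks(x), average_ranks(y))

model_cosines = [0.91, 0.35, 0.88, 0.40, 0.52, 0.61]   # hypothetical
human_scores = [6, 2, 7, 3, 2, 5]                      # hypothetical, 1 to 7
rho = spearman_rho(model_cosines, human_scores)
```

Because only ranks enter the computation, ρ is insensitive to the different scales of the model's cosines (between 0 and 1) and the human scores (between 1 and 7).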

The results for the models evaluated against both datasets are presented
below. The additive and multiplicative operations are applications of vector
addition and multiplication; Kintsch is a combination of the two, obtained by
multiplying the word vectors by certain weighting constants and then adding them;
for details please see [16]. The Baseline is from a non-compositional approach,
obtained by only comparing vectors of verbs of the sentences and ignoring their
subjects and objects. The UpperBound is the summary of the human ratings,
also known as inter-annotator agreement.



    First experiment

    Model         High    Low     ρ
    Baseline      0.27    0.26    0.08
    Add           0.59    0.59    0.04
    Kintsch       0.47    0.45    0.09
    Multiply      0.42    0.28    0.17
    Categorical   0.84    0.79    0.17
    UpperBound    4.94    3.25    0.40

    Second experiment

    Model         High    Low     ρ
    Baseline      0.47    0.44    0.16
    Add           0.90    0.90    0.05
    Multiply      0.67    0.59    0.17
    Categorical   0.73    0.72    0.21
    UpperBound    4.80    2.49    0.62

Table 4. Results of the 1st and 2nd compositional disambiguation experiments.



According to the literature (e.g. see [H]), the main measure of success is
demonstrated by the ρ column. By this measure, in the second experiment our
method outperforms the other two by a much better margin than in the
first experiment. The High (similarly Low) columns give the average score that
High (Low) similarity sentences (as decided by us) get from the program. These
are not very indicative, as the difference between the high mean and the low mean of
the categorical model is much smaller than that of both the baseline model
and the multiplicative model, despite better alignment with annotator judgements.

The data set of the first experiment has a very simple syntactic structure 
where the context around the verb is just its subject. As a result, in practice 
the categorical method becomes very similar to the multiplicative one and the 
similar outcomes should not surprise us. The second experiment, on the other
hand, has more syntactic structure, and there our categorical model shows an increase
in alignment with human judgements. Finally, the increase of ρ from the first
experiment to the second reflects the compositionality of our model: its
performance increases with the increase in syntactic complexity. Based on this, we
would like to believe that more complex datasets and experiments, which for
example include adjectives and adverbs, will lead to even better results.

5 Conclusion and Future Work

We have provided a brief overview of the categorical compositional distributional 
model of meaning as developed in . This combines the logical and vector space 
models using the setting of compact closed categories and their diagrammatic 
toolkit and based on ideas presented in on the use of tensor product as a 
meaning composition operator. We go over two concrete constructions of the 
setting, show examples of one construction on a toy vector space and implement 
the other construction on the real data from the BNC. The latter is evaluated on
a disambiguation task in two experiments: for intransitive verbs from [16] and
for transitive verbs developed by us. The categorical model slightly improves the
results of the first experiment and betters them in the second one.

To draw a closer connection with the subject area of the workshop, we would
like to recall that sentences of natural language are compound systems, whose
meanings exceed the meanings of their parts. Compound systems are a
phenomenon studied by many sciences; findings thereof should also provide valuable
insights for natural language processing. In fact, some of the above observations
and previous results were led by the use of compact categories in compound
quantum systems W. The caps that connect subject and verb from afar are
used to model nonlocal correlations in entangled Bell states; meanings of verbs
are represented as superposed states that let the information flow between their
subjects and objects and further act on it. Even on the level of single quantum
systems, there are similarities to the distributional meanings of words: both are
modeled using vector spaces. Motivated by this, [19,22] have used the methods
of quantum logic to provide logical and geometric structures for information
retrieval and have also obtained better results in practice. We hope and aim to
study the modular extension of the quantum logic methods to tensor spaces of
our approach. There are other approaches to natural language processing that
use compound quantum systems but which do not focus on distributional
models; for example, see [4].

Other areas of future work include creating and running more complex
experiments that involve adjectives and adverbs, working with larger corpora such as
the WaCKy, and interpreting stop words such as the relative pronouns 'who' and 'which',
the conjunctives 'and' and 'or', and the quantifiers 'every' and 'some'.